Handwritten Text Line Segmentation by Clustering with Distance Metric Learning
Authors
Abstract
Separating text lines in handwritten documents remains a challenge because the text lines are often non-uniformly skewed and curved. In this paper, we propose a novel text line segmentation algorithm based on Minimal Spanning Tree (MST) clustering with distance metric learning. Given a distance metric, the connected components of the document image are grouped into a tree structure. Text lines are then extracted by dynamically cutting the edges of the tree using a new objective function. To avoid manually set parameters and to improve segmentation accuracy, we design the distance metric by supervised learning. Experiments on handwritten Chinese documents demonstrate the superiority of the approach.
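The abstract describes the pipeline only at a high level, so the following is a minimal sketch of MST clustering of connected components, not the authors' implementation. It assumes each connected component is reduced to its centroid (x, y). The learned distance metric is replaced by a hand-set diagonal metric (hypothetical weights `wx`, `wy`), and the paper's objective function for cutting edges, which the abstract does not specify, is replaced by the simplest possible rule: cut the `num_lines - 1` heaviest tree edges. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of one connected component


def weighted_distance(a: Point, b: Point, wx: float = 1.0, wy: float = 9.0) -> float:
    """Hypothetical diagonal metric standing in for the learned one.

    Vertical offsets weigh more than horizontal ones, so components on the
    same text line end up closer than components on neighbouring lines.
    """
    dx, dy = a[0] - b[0], a[1] - b[1]
    return math.sqrt(wx * dx * dx + wy * dy * dy)


def minimum_spanning_tree(points: List[Point]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete graph; returns (weight, u, v) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting i to the tree
    parent = [-1] * n       # tree vertex at the other end of that edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = weighted_distance(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cluster_lines(points: List[Point], num_lines: int) -> List[int]:
    """Label each component with a text-line index by cutting the MST.

    Stand-in cutting rule (not the paper's objective): drop the
    (num_lines - 1) heaviest edges; the remaining subtrees are text lines.
    """
    edges = sorted(minimum_spanning_tree(points))
    keep = edges[: max(len(edges) - (num_lines - 1), 0)]

    root = list(range(len(points)))  # union-find over the kept edges

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, u, v in keep:
        root[find(u)] = find(v)

    # Relabel subtree roots as 0, 1, 2, ... in order of first appearance.
    mapping = {}
    return [mapping.setdefault(find(i), len(mapping)) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 10), (20, 10), (40, 10), (60, 10), (0, 50), (20, 50), (40, 50), (60, 50)]
    print(cluster_lines(pts, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With the vertical weight set to 9, the cheapest edge between the two rows costs 120 while edges within a row cost 20, so cutting the single heaviest edge separates the rows. A learned metric would set such weights from labeled data rather than by hand, which is the motivation the abstract gives for metric learning.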
Similar resources
Handwritten Chinese text line segmentation by clustering with distance metric learning
Article history: Received 7 August 2008 Received in revised form 21 November 2008 Accepted 20 December 2008
Segmenting Arabic Handwritten Documents into Text lines and Words
In this paper, we present a method for segmenting Arabic handwritten documents into text lines and words. Text line segmentation is addressed by a well-known technique, the horizontal projection profile, in which autocorrelation is used to enhance the self-similarity of this profile. This makes it easier to estimate the text line spacing. Word extraction is based on an adaptation of a know...
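The idea of estimating line spacing from the autocorrelation of a horizontal projection profile can be sketched as follows. This is a minimal illustration, assuming a binary image given as a list of rows with 1 for ink, and assuming the spacing is read off as the lag of the first positive local maximum of the mean-removed autocorrelation; the cited abstract is truncated and does not state the exact procedure. Function names and the `min_lag` parameter are hypothetical.

```python
from typing import List, Optional


def horizontal_profile(image: List[List[int]]) -> List[int]:
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def autocorrelation(profile: List[int]) -> List[float]:
    """Mean-removed, unnormalized autocorrelation for lags 0..len-1."""
    n = len(profile)
    mean = sum(profile) / n
    c = [p - mean for p in profile]
    return [sum(c[i] * c[i + lag] for i in range(n - lag)) for lag in range(n)]


def estimate_line_spacing(image: List[List[int]], min_lag: int = 2) -> Optional[int]:
    """Return the lag of the first positive local maximum of the autocorrelation.

    For roughly periodic text lines this lag approximates the line pitch
    (distance between consecutive lines, in pixel rows).
    """
    profile = horizontal_profile(image)
    if not profile:
        return None
    ac = autocorrelation(profile)
    for lag in range(min_lag, len(ac) - 1):
        if ac[lag] > 0 and ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    return None


if __name__ == "__main__":
    # Synthetic page: 5 "lines", each 3 ink rows followed by 7 blank rows.
    page = [[1] * 20 if r % 10 < 3 else [0] * 20 for r in range(50)]
    print(estimate_line_spacing(page))  # 10
```

Removing the mean before correlating makes blank-against-ink overlaps contribute negatively, so the peak at the true pitch stands out from the short-lag values.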
South Indian Tamil Language Handwritten Document Text Line Segmentation Technique with Aid of Sliding Window and Skewing Operations
In document image analysis, text line segmentation is one of the key components. The segmentation result provides essential information for skew correction, zone segmentation, and character recognition. Segmenting printed document images into text lines has received numerous contributions from fellow researchers, yet there is still considerable scope for improvement. The key ...
Text line and word segmentation of handwritten documents
In this paper, we present a methodology for segmenting handwritten documents into their distinct entities, namely text lines and words. Text line segmentation is achieved by applying the Hough transform to a subset of the document image's connected components. A post-processing step includes the correction of possible false alarms, the detection of text lines that the Hough transform failed to create and...
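Hough voting over connected components can be illustrated with a small sketch. It assumes each component is reduced to its centroid, restricts the line normal to 80 to 100 degrees so that only near-horizontal lines are found, and extracts lines iteratively: take the strongest accumulator cell, assign its voters to a text line, remove them, and vote again. The cited abstract is truncated and does not describe how the component subset or the voting cells are chosen, so the angle range, `rho_step`, `min_votes`, and the function name are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of one connected component


def hough_text_lines(points: List[Point], thetas_deg=range(80, 101),
                     rho_step: float = 5.0, min_votes: int = 3) -> List[List[Point]]:
    """Iteratively extract the strongest Hough line and remove its voters.

    A line is parametrized as rho = x*cos(theta) + y*sin(theta); limiting
    theta to 80..100 degrees keeps only near-horizontal lines.
    """
    remaining = list(points)
    lines: List[List[Point]] = []
    while len(remaining) >= min_votes:
        # Accumulator: (theta in degrees, rho bin) -> centroids voting for that cell.
        acc: Dict[Tuple[int, int], List[Point]] = {}
        for p in remaining:
            for t in thetas_deg:
                th = math.radians(t)
                rho = p[0] * math.cos(th) + p[1] * math.sin(th)
                acc.setdefault((t, round(rho / rho_step)), []).append(p)
        members = max(acc.values(), key=len)  # strongest cell
        if len(members) < min_votes:
            break  # no remaining cell is supported by enough components
        lines.append(members)
        remaining = [p for p in remaining if p not in members]
    return lines


if __name__ == "__main__":
    row1 = [(x, 10) for x in range(0, 101, 20)]
    row2 = [(x, 50) for x in range(0, 101, 20)]
    for line in hough_text_lines(row1 + row2):
        print(sorted(line))
```

Removing the voters after each detection keeps one dominant line from being reported several times at neighbouring angles; components left over when no cell reaches `min_votes` are the kind of leftovers a post-processing step would have to handle.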
Component-based Segmentation of Words from Handwritten Arabic Text
Efficient preprocessing is essential for the automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. First, connected components (CCs) are extracted, and the distances between different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words se...
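Thresholding the distribution of inter-component distances to split a text line into words can be sketched as below. The cited abstract is cut off before it says how the "optimal threshold" is chosen, so this sketch swaps in Otsu's method applied to the one-dimensional gap distribution, a standard two-class threshold; it is not necessarily the cited authors' choice. It assumes each connected component on one text line is given by its horizontal extent `(x_start, x_end)`; function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # (x_start, x_end) of one connected component on a text line


def otsu_threshold(values: List[float]) -> float:
    """1-D Otsu: pick the split that maximizes between-class variance.

    Returns the largest value of the lower class (gaps <= threshold are
    treated as within-word gaps).
    """
    vals = sorted(values)
    n = len(vals)
    best_t, best_score = vals[-1], -1.0
    for k in range(1, n):
        if vals[k] == vals[k - 1]:
            continue  # cannot split between equal values
        low, high = vals[:k], vals[k:]
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = (len(low) / n) * (len(high) / n) * (m0 - m1) ** 2
        if score > best_score:
            best_score, best_t = score, vals[k - 1]
    return best_t


def segment_words(boxes: List[Box]) -> List[List[Box]]:
    """Group components into words by thresholding the horizontal gaps."""
    boxes = sorted(boxes)
    if len(boxes) < 2:
        return [boxes] if boxes else []
    gaps, right = [], boxes[0][1]
    for b in boxes[1:]:
        gaps.append(max(b[0] - right, 0))  # overlapping components give gap 0
        right = max(right, b[1])           # running right edge handles nesting
    t = otsu_threshold(gaps)
    words, current = [], [boxes[0]]
    for g, b in zip(gaps, boxes[1:]):
        if g > t:                # inter-word gap: start a new word
            words.append(current)
            current = [b]
        else:                    # intra-word gap: extend the current word
            current.append(b)
    words.append(current)
    return words


if __name__ == "__main__":
    line = [(0, 10), (12, 20), (22, 30), (45, 55), (57, 65), (80, 90)]
    for word in segment_words(line):
        print(word)
```

Gaps here are 2, 2, 15, 2, 15, so the split falls between 2 and 15 and the line breaks into three words. Measuring each gap against the running right edge, rather than only the previous box, keeps diacritics or other components nested inside a wider component from producing spurious gaps.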